We will create a 5-year forecast of annual global emissions of carbon dioxide (CO2) using the data set
Annual Global Emissions of Carbon Dioxide from 1940 to 2023
and Meta's Prophet forecasting system. Prophet is a time series forecasting
tool that models the separate components of a series, such as its trend
and seasonality.
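For reference, Prophet fits an additive model of roughly the following form, where g(t) is the trend, s(t) the seasonal terms, h(t) optional holiday effects and ε_t the noise. This is shown only as a sketch of the model's structure, not of the fitting procedure.

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$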
Source of data set: Global Carbon Project (Friedlingstein et al., 2023).
The data is also available here: https://www.statista.com/statistics/276629/global-co2-emissions/
We will use the ‘prophet’ package in R to forecast our time series. To use it, we install the package with install.packages("prophet") and load it with library(prophet) later on.
After installation, we create a data frame from our data, using the Year column as
ds and the Emissions In Billion Metric Tons column as
y. Prophet expects exactly these column names: ds for the dates and y for the values to forecast.
To ensure that the ds column is recognised as years (YYYY), we convert it into character
strings and then into R's Date format. To confirm that the imported data
has become a data frame, we use the class() function.
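library(readxl)
# Read the annual emissions spreadsheet (1940-2023)
Annual_global_emissions_of_carbon_dioxide_1940_2023 <- read_excel("Annual-global-emissions-of-carbon-dioxide-1940-2023.xlsx")
# Prophet expects a date column named ds and a value column named y
GlobalCo2.df <- data.frame(ds=Annual_global_emissions_of_carbon_dioxide_1940_2023$Year,y=Annual_global_emissions_of_carbon_dioxide_1940_2023$`Emissions In Billion Metric Tons`)
# Convert the years to Date. With format = "%Y" alone, as.Date() takes the
# missing month and day from the current date, so every ds shares that month and day
GlobalCo2.df$ds <- as.Date(as.character(GlobalCo2.df$ds), format = "%Y")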
class(GlobalCo2.df)
## [1] "data.frame"
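A minimal sketch of a reproducible way to build the ds column, assuming the years are stored as plain numbers as in the spreadsheet. Because as.Date() with format "%Y" alone borrows today's month and day, pinning each year to 1 January instead gives the same dates whenever the code is run. The years vector is an illustrative subset, not the full data.

```r
# Illustrative subset of the Year column (assumption: years stored as numbers)
years <- c(1940, 1941, 1942)

# format = "%Y" alone: month and day are taken from today's date,
# so these values change depending on when the code is run
date_from_today <- as.Date(as.character(years), format = "%Y")

# Reproducible alternative: fix every observation to 1 January
ds <- as.Date(paste0(years, "-01-01"))
print(ds)
```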
Then, we convert the data frame into a
time series object. Printing the object confirms that the
conversion has been successful.
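# Annual series: use the first year (1940) as the start. Passing the Date
# itself would set the start to its internal value in days since 1970-01-01
GlobalCo2_ts <- ts(GlobalCo2.df$y, start = as.numeric(format(min(GlobalCo2.df$ds), "%Y")), frequency = 1)
GlobalCo2_ts
## Time Series:
## Start = 1940
## End = 2023
## Frequency = 1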
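For an annual series, ts() expects start to be the calendar year. A small sketch using the first five values printed below (1940 to 1944) shows the difference; passing a Date instead supplies its internal value, the number of days since 1970-01-01.

```r
# First five annual values of the series (1940-1944), billion metric tons
emissions <- c(4.86, 4.97, 4.96, 5.04, 5.12)

# Annual data: frequency = 1 and start given as a year
x <- ts(emissions, start = 1940, frequency = 1)
print(time(x))   # time index runs from 1940 to 1944

# A Date's numeric value is days since 1970-01-01, not a year
as.numeric(as.Date("1940-01-01"))   # -10958
```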
## [1] 4.86 4.97 4.96 5.04 5.12 4.26 4.65 5.15 5.42 5.18 5.93 6.38
## [13] 6.47 6.65 6.79 7.44 7.93 8.19 8.42 8.85 9.39 9.41 9.75 10.27
## [25] 10.82 11.31 11.86 12.24 12.90 13.76 14.90 15.50 16.22 17.08 17.01 17.05
## [37] 17.99 18.49 19.06 19.60 19.48 19.02 18.87 18.99 19.64 20.31 20.61 21.25
## [49] 22.08 22.38 22.75 23.23 22.57 22.80 23.03 23.52 24.25 24.40 24.33 24.83
## [61] 25.50 25.67 26.25 27.65 28.62 29.59 30.61 31.50 32.04 31.49 33.31 34.44
## [73] 34.94 35.23 35.47 35.46 35.46 36.03 36.77 37.04 35.01 36.82 37.15 37.55
As discussed at the start, we now use the
prophet package to forecast global emissions
of carbon dioxide 5 years ahead.
To do this, we create a future data frame containing the dates of the next 5 years, predict emissions for those dates and plot the forecast.
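The future dates can be sketched in base R. Assumption: make_future_dataframe(m, periods = 5, freq = "year") steps forward from the last date in the history by the given frequency and, with its default include_history = TRUE, also keeps the historical dates. The last date below is a hypothetical stand-in for the final ds value.

```r
# Hypothetical last observed ds value (the real one comes from the data)
last_ds <- as.Date("2023-01-01")

# Five yearly steps beyond the last observation: drop the starting date itself
future_ds <- seq(last_ds, by = "year", length.out = 5 + 1)[-1]
print(future_ds)   # 2024-01-01 through 2028-01-01
```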
library(prophet)
## Loading required package: Rcpp
## Loading required package: rlang
When the prophet() function fits the model, it enables yearly
seasonality by default because the data span more than two years, and
disables weekly and daily seasonality because the observations are a year
apart, far too widely spaced to estimate weekly or daily patterns.
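These defaults can be sketched as a rule on the dates alone. This is an approximation based on our reading of Prophet's automatic seasonality settings; auto_seasonality is a hypothetical helper, not part of the prophet package.

```r
# Hypothetical helper approximating Prophet's automatic seasonality choices.
# Assumed rules: yearly needs at least 730 days of history; weekly needs at
# least 14 days of history and some gap under 7 days; daily needs at least
# 2 days of history and some gap under 1 day.
auto_seasonality <- function(dates) {
  dates <- sort(as.Date(dates))
  span <- as.numeric(max(dates) - min(dates), units = "days")
  gaps <- as.numeric(diff(dates), units = "days")
  min_gap <- min(gaps[gaps > 0])   # smallest positive spacing between dates
  c(yearly = span >= 730,
    weekly = span >= 14 && min_gap < 7,
    daily  = span >= 2 && min_gap < 1)
}

annual <- as.Date(paste0(1940:2023, "-01-01"))
auto_seasonality(annual)   # yearly TRUE, weekly FALSE, daily FALSE
```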
m <- prophet(GlobalCo2.df)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
FutureDatesforNext5Years = prophet::make_future_dataframe(m, periods = 5, freq = "year")
ForecastForNext5Years = predict(m,FutureDatesforNext5Years)
plot(m, ForecastForNext5Years)
Here is the 5-year forecast of
Annual Global Carbon Dioxide Emissions as an interactive
graph made with dygraphs.
dyplot.prophet(m, ForecastForNext5Years)
## Warning: `select_()` was deprecated in dplyr 0.7.0.
## ℹ Please use `select()` instead.
## ℹ The deprecated feature was likely used in the prophet package.
## Please report the issue at <https://github.com/facebook/prophet/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
From our plotted 5-year forecast, global emissions of carbon dioxide are projected to keep increasing year by year, with a point estimate of 41.51 billion metric tons in 2028.
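A quick arithmetic check puts the 2028 figure in context. Assuming the 41.51 point forecast quoted above and the observed 2013 and 2023 values from the printed series, the sketch converts each change into a compound annual growth rate; cagr is a small illustrative helper.

```r
# Values taken from the text and the printed series (billion metric tons)
observed_2013  <- 35.23
observed_2023  <- 37.55
projected_2028 <- 41.51   # Prophet point forecast quoted above

# Compound annual growth rate between two values n years apart
cagr <- function(from, to, n) (to / from)^(1 / n) - 1

implied <- cagr(observed_2023, projected_2028, 5)   # 2023 -> 2028
recent  <- cagr(observed_2013, observed_2023, 10)   # 2013 -> 2023
round(100 * c(implied = implied, recent = recent), 2)   # percent per year
```

This gives an implied growth rate of about 2.0% a year for 2023 to 2028, against about 0.6% a year for 2013 to 2023, a gap worth bearing in mind when reading the point forecast.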
library(prophet)
prophet_plot_components(m, ForecastForNext5Years)
The trend component shows the direction and size of change in carbon dioxide emissions over the years. It is an almost straight, upward-sloping line, indicating a consistent upward trajectory in global carbon dioxide emissions that the model extends over the next 5 years. In other words, the model expects the persistent increase in global carbon dioxide emissions to continue.
The yearly seasonality graph, however, should not be read as evidence of within-year patterns. It shows large swings in the first 5 months (a slight rise, a drop to about -5, then further sharp rises and falls) and smaller swings of roughly equal size over the remaining 7 months. But the data contain only one observation per year, all on the same calendar day, so the model has no information about how emissions vary within a year: these month-to-month shapes are an artefact of fitting a yearly seasonal term to annual data rather than genuine seasonality. For annual data it would be reasonable to refit with prophet(GlobalCo2.df, yearly.seasonality = FALSE).
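A short derivation shows why the yearly component cannot be pinned down by annual data. Prophet represents yearly seasonality with a Fourier series of period P = 365.25 days (order N = 10 by default). As an idealisation, assume each observation falls exactly one period after the previous one:

$$
s(t) = \sum_{n=1}^{N} \left[ a_n \cos\!\left(\frac{2\pi n t}{P}\right) + b_n \sin\!\left(\frac{2\pi n t}{P}\right) \right]
$$

$$
t_k = t_0 + kP \;\Rightarrow\; \cos\!\left(\frac{2\pi n t_k}{P}\right) = \cos\!\left(\frac{2\pi n t_0}{P} + 2\pi n k\right) = \cos\!\left(\frac{2\pi n t_0}{P}\right)
$$

The same holds for the sine terms, so s(t_k) = s(t_0) for every k. The seasonal term takes the same value at every observation and is indistinguishable from a shift in the trend's level, so the shape of the curve between observation dates is largely not determined by the data.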
We will fit a linear regression of ‘Emissions’ on ‘Years’ to measure the growth of our series.
model=lm(y~ds, GlobalCo2.df)
summary(model)
##
## Call:
## lm(formula = y ~ ds, data = GlobalCo2.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2234 -0.9528 -0.2702 0.9027 3.2170
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.444e+01 1.571e-01 91.92 <2e-16 ***
## ds 1.176e-03 1.597e-05 73.63 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.296 on 82 degrees of freedom
## Multiple R-squared: 0.9851, Adjusted R-squared: 0.9849
## F-statistic: 5421 on 1 and 82 DF, p-value: < 2.2e-16
From the summary of the linear regression model, the adjusted R-squared is 0.9849. Being close to 1, it indicates a very good fit: after adjusting for the number of predictors, the linear trend accounts for about 98.5% of the variance in global carbon dioxide emissions.
The Multiple R-squared is the proportion of variance in the Emissions variable that is explained by the Years variable. Its value of 0.9851 is also close to 1, i.e. close to a perfect fit.
The p-value of < 2.2e-16 is very small, so the model is statistically significant and there is a strong linear relationship between the years and carbon dioxide emissions.
The ds estimate of 1.176e-03 is beta 1, the slope of the linear trend. Because ds is a Date, lm() measures it in days since 1970-01-01, so the slope is an increase of 1.176e-03 billion metric tons per day, or about 0.43 billion metric tons (1.176e-03 × 365.25) per year.
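The unit conversion behind the slope, as a short worked calculation using the estimate from summary(model):

```r
# ds is a Date, which lm() uses as days since 1970-01-01,
# so the reported slope is per day
slope_per_day  <- 1.176e-03                 # estimate from summary(model)
slope_per_year <- slope_per_day * 365.25    # average length of a year in days
round(slope_per_year, 3)                    # about 0.43 billion metric tons per year
```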
We will now plot Emissions vs Year and
add the fitted regression line to the graph.
# Scatter plot of the observed annual emissions
plot(GlobalCo2.df$ds,GlobalCo2.df$y, type="p", xlab ="Year", ylab="Emissions (in Billion Metric Tons)")
# Overlay the fitted linear trend in blue
lines(GlobalCo2.df$ds, fitted(model), col="blue")
The graph above shows global emissions of carbon dioxide following a positive, roughly linear pattern, rising steadily over the years. There are some notable dips, in 1945, the early 1980s, 2009 and 2020, which coincide with global events: the end of WW2, economic recessions (including the 2008 financial crisis) and the COVID-19 outbreak (through lockdowns and restrictions). The largest annual reduction came at the end of WW2 in 1945, when emissions of CO2 fell by about 17%, from 5.12 to 4.26 billion metric tons.
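The dips can be located directly from the printed series. A sketch, assuming the 84 values shown in the time series output correspond to the years 1940 to 2023:

```r
# Annual global emissions (billion metric tons), 1940-2023, as printed above
emissions <- c(
  4.86, 4.97, 4.96, 5.04, 5.12, 4.26, 4.65, 5.15, 5.42, 5.18, 5.93, 6.38,
  6.47, 6.65, 6.79, 7.44, 7.93, 8.19, 8.42, 8.85, 9.39, 9.41, 9.75, 10.27,
  10.82, 11.31, 11.86, 12.24, 12.90, 13.76, 14.90, 15.50, 16.22, 17.08, 17.01, 17.05,
  17.99, 18.49, 19.06, 19.60, 19.48, 19.02, 18.87, 18.99, 19.64, 20.31, 20.61, 21.25,
  22.08, 22.38, 22.75, 23.23, 22.57, 22.80, 23.03, 23.52, 24.25, 24.40, 24.33, 24.83,
  25.50, 25.67, 26.25, 27.65, 28.62, 29.59, 30.61, 31.50, 32.04, 31.49, 33.31, 34.44,
  34.94, 35.23, 35.47, 35.46, 35.46, 36.03, 36.77, 37.04, 35.01, 36.82, 37.15, 37.55
)
years <- 1940:2023
stopifnot(length(emissions) == length(years))

# Percentage change from the previous year, labelled by year
pct_change <- 100 * diff(emissions) / head(emissions, -1)
names(pct_change) <- years[-1]

round(pct_change[pct_change < 0], 1)   # years in which emissions fell
names(which.min(pct_change))           # year of the largest fall: "1945"
```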
Additionally, we will plot the fitted values against the standardized residuals to assess model assumptions.
plot(fitted(model), rstandard(model), xlab="Fitted Values", ylab = "Standardized Residuals", main = "Residuals vs Fitted", type = "l")
abline(a=0, b=0)
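In the Residuals vs Fitted graph above, the residuals are not scattered randomly around zero. There is an initial sharp decrease with slight fluctuations, followed by heavy fluctuation, and another noticeable sharp decrease later on. Because the fitted values increase with time, this plot effectively traces the residuals in time order, and these long runs above and below zero suggest that the straight line misses some curvature in the trend and that the residuals are autocorrelated. The linear model's assumptions of linearity and independent errors are therefore questionable.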
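One way to quantify this is the lag-1 autocorrelation of the residuals, the correlation between each year's residual and the next. A sketch using the printed series; assumption: regressing on the integer year instead of the Date column gives essentially the same line, with the slope expressed per year rather than per day.

```r
# Annual global emissions (billion metric tons), 1940-2023, as printed above
emissions <- c(
  4.86, 4.97, 4.96, 5.04, 5.12, 4.26, 4.65, 5.15, 5.42, 5.18, 5.93, 6.38,
  6.47, 6.65, 6.79, 7.44, 7.93, 8.19, 8.42, 8.85, 9.39, 9.41, 9.75, 10.27,
  10.82, 11.31, 11.86, 12.24, 12.90, 13.76, 14.90, 15.50, 16.22, 17.08, 17.01, 17.05,
  17.99, 18.49, 19.06, 19.60, 19.48, 19.02, 18.87, 18.99, 19.64, 20.31, 20.61, 21.25,
  22.08, 22.38, 22.75, 23.23, 22.57, 22.80, 23.03, 23.52, 24.25, 24.40, 24.33, 24.83,
  25.50, 25.67, 26.25, 27.65, 28.62, 29.59, 30.61, 31.50, 32.04, 31.49, 33.31, 34.44,
  34.94, 35.23, 35.47, 35.46, 35.46, 36.03, 36.77, 37.04, 35.01, 36.82, 37.15, 37.55
)
years <- 1940:2023

fit <- lm(emissions ~ years)   # same linear trend, with the year as regressor
res <- residuals(fit)

# Lag-1 autocorrelation: correlation of each residual with the next year's
lag1 <- cor(head(res, -1), tail(res, -1))
round(c(slope_per_year = coef(fit)[["years"]], lag1 = lag1), 3)
```

A value well above zero means neighbouring residuals move together, which is what the long runs in the plot suggest.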
We can also create a 5-year forecast of the
co2 data set that comes with R (type co2 in lower case; the
similarly named CO2 data set instead holds an experiment on the cold
tolerance of grass). The co2 data set records atmospheric concentrations of CO2
at Mauna Loa in parts per million (ppm), collected monthly from 1959 to 1997.
Source: Keeling, C. D. and Whorf, T. P., Scripps Institution of Oceanography (SIO), University of California, La Jolla, California USA 92093-0220.
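A quick check that distinguishes the two built-in data sets whose names differ only by case (both ship with base R's datasets package):

```r
# co2 (lower case): monthly Mauna Loa concentrations as a time series
is.ts(co2)
frequency(co2)          # 12 observations per year
start(co2); end(co2)    # January 1959 to December 1997
length(co2)             # 468 months

# CO2 (upper case): a data frame from the grass cold-tolerance experiment
is.data.frame(CO2)
dim(CO2)
```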
We can apply the methods above and the Prophet forecasting system to
the co2 data set to produce a 5-year forecast and carry out an
analysis.
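# ds: month of each observation as a zoo 'yearmon' value (stored as decimal years)
# y: the monthly CO2 concentration in ppm
co2.df = data.frame(
ds=zoo::as.yearmon(time(co2)),
y=co2)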
m2 <- prophet::prophet(co2.df)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
# 20 quarterly steps = 5 years beyond the last observation (December 1997)
FutureDatesfor5Yrs = prophet::make_future_dataframe(m2, periods=20, freq="quarter")
Prediction = predict(m2, FutureDatesfor5Yrs)
plot(m2, Prediction)
dyplot.prophet(m2, Prediction)
From this plotted 5-year forecast, atmospheric concentrations of carbon
dioxide are projected to keep rising along a roughly linear trend, with a
regular seasonal cycle superimposed, reaching an estimated 370.38 parts per
million in 2002. The forecast follows the same upward trajectory as the
Annual Global Emissions of Carbon Dioxide forecast plot.
One difference is that the
Annual Global Emissions of Carbon Dioxide data are recorded
annually, whereas the co2 data are recorded monthly, so there are many
more data points in the co2 forecast plot. Another difference is that
the two data sets are measured in different units (billion metric tons
versus parts per million).
library(prophet)
prophet_plot_components(m2, Prediction)
The trend component shows the direction and size of change in
atmospheric concentrations of carbon dioxide over the years. It is an
almost straight, upward-sloping line, indicating a consistent upward
trajectory in atmospheric concentrations of carbon dioxide that the model
extends over the next 5 years, up to 2002. In other words, the model
expects the persistent increase in atmospheric concentrations of carbon
dioxide to continue. The trend component for the co2 data is similar to
the trend component of the Annual Global Emissions of Carbon Dioxide data.
The yearly seasonality graph shows the typical pattern of the
concentrations within a year. The seasonal effect dips slightly at first,
then rises with some fluctuation until May. After May it falls gradually
to about -5 just before October, then rises back towards 0, again with
slight fluctuations. Unlike the annual emissions series, which has only
one observation per year, the monthly co2 data contain enough within-year
information for this seasonal pattern to be estimated. The seasonality
component of the co2 data is therefore different from the seasonality
component of the Annual Global Emissions of Carbon Dioxide data.
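As a cross-check, classical decomposition with decompose() estimates an average seasonal effect for each calendar month of the co2 series. This is a moving-average method, not Prophet's Fourier-series approach, so the values will not match Prophet's curve exactly; the timing of the peak and trough is what is worth comparing.

```r
# Additive classical decomposition of the monthly co2 series
seasonal <- decompose(co2)$figure   # one seasonal effect per calendar month
names(seasonal) <- month.abb        # co2 starts in January, so Jan..Dec

round(seasonal, 2)
names(which.max(seasonal))   # month with the highest seasonal effect
names(which.min(seasonal))   # month with the lowest seasonal effect
```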
We will fit a linear regression of ‘CO2 concentrations’ on ‘Time’ to measure the growth of our series.
model2=lm(y~ds, co2.df)
summary(model2)
##
## Call:
## lm(formula = y ~ ds, data = co2.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.0399 -1.9476 -0.0017 1.9113 6.5149
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.250e+03 2.127e+01 -105.8 <2e-16 ***
## ds 1.308e+00 1.075e-02 121.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.618 on 466 degrees of freedom
## Multiple R-squared: 0.9695, Adjusted R-squared: 0.9694
## F-statistic: 1.479e+04 on 1 and 466 DF, p-value: < 2.2e-16
From the summary of the linear regression model, the adjusted R-squared
is 0.9694, which indicates a very good fit: the linear trend accounts for
about 96.9% of the variance in atmospheric concentrations of carbon
dioxide, slightly less than for the Annual Global Emissions of CO2 data.
The Multiple R-squared is the proportion of variance in the
Concentrations variable that is explained by the Time variable. Its value
of 0.9695 is close to 1, i.e. close to a perfect fit, but is again lower
than for the Annual Global Emissions of CO2 data.
The p-value of < 2.2e-16 is very small, so the model is statistically
significant and there is a strong linear relationship between time and
atmospheric concentrations of carbon dioxide, as was the case for the
Annual Global Emissions of CO2 data.
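The ds estimate of 1.308e+00 is beta 1, the slope of the linear trend. Here ds is a yearmon value, which lm() treats as a number of years (1959, 1959.083, ...), so atmospheric concentrations of carbon dioxide increase by about 1.308 parts per million per year, or roughly 0.11 parts per million per month.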
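model2 can be reproduced with base R alone, which makes the units of the slope explicit. Assumption: time(co2) returns the same decimal-year values (1959, 1959.083, ...) that zoo::as.yearmon() stores for ds.

```r
# Same straight-line trend, with time measured in decimal years
fit <- lm(as.numeric(co2) ~ as.numeric(time(co2)))

slope_per_year  <- coef(fit)[[2]]       # ppm per year
slope_per_month <- slope_per_year / 12  # ppm per month
round(c(per_year = slope_per_year, per_month = slope_per_month), 3)
```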
We will now plot CO2 Concentrations vs Time
and add the fitted regression line to the graph.
# Scatter plot of the monthly concentrations
plot(co2.df$ds,co2.df$y, type="p", xlab ="Year", ylab="CO2 Concentrations in Parts per Million")
# Overlay the fitted linear trend in blue
lines(co2.df$ds, fitted(model2), col="blue")
The graph above shows atmospheric concentrations of carbon dioxide
following a positive, roughly linear pattern, rising steadily over the
years, as the Annual Global Emissions of Carbon Dioxide data do.
However, the graph contains many more data points, because the co2 data
are monthly whereas the Annual Global Emissions of Carbon Dioxide data
are annual.
Additionally, we will plot the fitted values against the standardized residuals.
plot(fitted(model2), rstandard(model2), xlab="Fitted Values", ylab = "Standardized Residuals", main = "Residuals vs Fitted", type = "l")
abline(a=0, b=0)
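In the Residuals vs Fitted graph above, the residuals fluctuate strongly and regularly around zero rather than scattering randomly. This regular oscillation reflects the seasonal cycle, which a straight-line trend cannot capture, so the residuals are likely to be strongly autocorrelated and the independence assumption of the linear model is unlikely to hold.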
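One way to quantify this is the autocorrelation of the residuals at lags of 1 and 12 months. A sketch using base R only, refitting the same straight-line trend on decimal years; lag_cor is a small hypothetical helper, not a library function.

```r
fit <- lm(as.numeric(co2) ~ as.numeric(time(co2)))
res <- residuals(fit)

# Hypothetical helper: correlation between a series and itself shifted by k steps
lag_cor <- function(x, k) cor(head(x, -k), tail(x, -k))

round(c(lag1 = lag_cor(res, 1), lag12 = lag_cor(res, 12)), 2)
```

Large positive values at both lags would indicate that residuals one month apart, and residuals one year apart, move together, consistent with the seasonal cycle and curvature that the straight line leaves behind.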
When comparing both time series, there are noticeable similarities,
but differences such as seasonality should not be ignored. It should be
noted that the two data sets cover different time periods (although with
some overlap) and are measured in different units. They also measure
different quantities: the co2 data record atmospheric CO2 concentrations
monthly at a single site, while the Global Emissions of Carbon Dioxide
data record the amount of CO2 emitted worldwide each year. Both series
rise over their overlapping years, so they are positively correlated over
that period, but correlation between two trending series does not by
itself show that one drives the other.
However, although they are different data sets, there is a plausible causal link between them. Atmospheric CO2 concentration is influenced by many factors, and global emissions of carbon dioxide are one of the contributors to its increase over time.
This link can only be examined over the years in which both data sets
overlap, since the co2 data were collected from 1959 to 1997, whereas the
Annual Global Emissions of Carbon Dioxide data cover 1940 to 2023.
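A sketch of how the overlap can be checked, using the emission values printed earlier for 1959 to 1997 and annual means of the built-in co2 series. Comparing year-to-year changes as well as levels is a stricter check, because any two series that both trend upward tend to show a high correlation in levels.

```r
# Annual global emissions (billion metric tons) for 1959-1997, from the printed series
emissions <- c(
  8.85, 9.39, 9.41, 9.75, 10.27, 10.82, 11.31, 11.86, 12.24, 12.90,
  13.76, 14.90, 15.50, 16.22, 17.08, 17.01, 17.05, 17.99, 18.49, 19.06,
  19.60, 19.48, 19.02, 18.87, 18.99, 19.64, 20.31, 20.61, 21.25, 22.08,
  22.38, 22.75, 23.23, 22.57, 22.80, 23.03, 23.52, 24.25, 24.40
)

# Annual mean CO2 concentration (ppm) from the monthly co2 series
co2_annual <- as.numeric(aggregate(co2, nfrequency = 1, FUN = mean))
stopifnot(length(co2_annual) == length(emissions))   # 39 overlapping years

c(levels  = cor(emissions, co2_annual),
  changes = cor(diff(emissions), diff(co2_annual)))
```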